Abstract. Networked applications have software components that reside on different computers. Email, for example, has database, processing, and user interface components that can be distributed across
a network and shared by users in different locations or work groups. 
End-to-end performance and reliability metrics describe the software 
quality experienced by these groups of users, taking into account all the 
software components in the pipeline. Each user produces only some 
of the data needed to understand the quality of the application for 
the group, so group performance metrics are obtained by combining 
summary statistics that each end computer periodically (and auto- 
matically) sends to a central server. The group quality metrics usu- 
ally focus on medians and tail quantiles rather than on averages. Dis- 
tributed quantile estimation is challenging, though, especially when 
passing large amounts of data around the network solely to compute 
quality metrics is undesirable. This paper describes an Incremental 
Quantile (IQ) estimation method that is designed for performance mon- 
itoring at arbitrary levels of network aggregation and time resolution 
when only a limited amount of data can be transferred. Applications 
to both real and simulated data are provided. 
1. MONITORING NETWORKED APPLICATIONS

A stand-alone software application like a text pro- 
cessor resides entirely on one computer and is ac- 
cessed only by the people who use that computer. 
The components and users of a networked software 
application like email, though, span multiple com- 
puters. The database that stores current email mes- 
sages may reside on one (or more) computers, the 
database of previously read messages may reside on 
another computer, the mail processing software may 
reside on yet another computer, and the user interface that allows email to be read and sent easily may reside on many personal computers. That
is, the components of the networked software, the 
users of the software, and the requests and actions 
by users are all distributed over the network. 

Networked services can fail in many ways, and the 
failures are often localized to a set of nodes that 
share a small fraction of the network infrastructure. 
Email transactions for only a subset of users may 
be delayed by server problems that disrupt a region 
of a network, or database accesses may be slow be- 
cause of heavy seasonal tasks that are performed 
by only some of the workers. Consequently, system 
administrators need to assess availability, reliability 
and performance with the structure of the network 
in mind, without specifying in advance which pieces 
of the network or which work groups to monitor to- 
gether. 

Monitoring the health of networked applications 
is challenging. First, the desktop computers or end 
user nodes that access the application may have only 
limited resources to allocate to processing metrics. 
At best, each end user may be able to compute lim- 
ited summaries of its performance. Moving all the 
performance data concerning all transactions from 
all end users to a dedicated server does not circum- 
vent the problem of weak end nodes because trans- 
ferring large amounts of data can place too high 
a load on the network. Thus, both the data and 
computational resources needed to compute quality 
metrics for networked software applications need to 
be distributed over the network. Finally, there are 
statistical challenges too. For example, users in the 
same building may have dissimilar tasks, so the ag- 
gregated performance data from that location look 
like a sample from a mixture with multiple modes 
and long tails rather than like a sample from a sim- 
ple parametric model. 

This paper describes an approach to monitoring 
networked applications that we developed in response 
to the needs of a business unit of Lucent Technolo- 
gies. To accommodate a wide range of statistical 
distributions, monitoring is based on tracking me- 
dians and upper quantiles rather than averages and 
higher-order moments. The nature of the specific 
problem, the constraints on computing that have 
to be addressed, and a high-level view of the ap- 
proach we took are described in Section 2; related 
approaches are discussed in Section 3. Our design 
has two parts: a lightweight sequential method that 
summarizes the performance data that are collected at each user's computer (Section 4) and a slight
variant of the sequential method that further ag- 
gregates the user summaries over arbitrary subsets 
of the network and time (Section 5). (Using nearly 
the same algorithm at the end-user and server levels 
was one of the constraints specified by the engineers 
of our application.) Enhancements to achieve better 
accuracy are discussed in Section 6. Performance of 
the user-level algorithm is evaluated on simulated 
data (Section 7). Performance of the server algo- 
rithm that computes group-level metrics is evalu- 
ated on transaction time data collected from a group 
of corporate users and simulated work-group data 
(Section 8). Some ideas for generalizing the meth- 
ods are given in Section 9. 

2. MONITORING NETWORKED SOFTWARE 

Networked software provides applications such as 
email, database access, and voice and conferencing 
services to an enterprise. In a typical configuration, 
portions of the software live on servers and employ- 
ees of the enterprise access it using clients that live 
on their desktop computers. Monitoring agents are 
special clients that observe the performance details 
for each attempted and completed software trans- 
action: round trip time, server response time, band- 
width used, completion status, packet loss, total trans- 
action time, and so on. It is these performance data 
that describe the software quality that the user has 
experienced, and the data for a group of users de- 
scribe the software quality delivered to the group. 
The monitoring agents summarize the data and pe- 
riodically send the summaries to a central server 
that is responsible for monitoring the reliability and 
performance of the application across the network. 
Figure 1 illustrates the high-level flow of data and 
summary records in the monitoring application. In 
these applications, reliability problems are failures 
of the network, servers and applications to deliver 
adequate performance to the end users. Problems 
may not be exhibited through complete failure of 
the infrastructure, but rather through soft metrics 
such as overly long response times on high volume 
transactions. 

To save space on the end user's computer, the 
monitoring agents summarize the performance data 
with a fixed-length record, one record for each trans- 
action type, that is updated with new performance 
data whenever the networked application is used. 
Often the record is too small to hold all the raw 
data, and in this case it must hold summaries of the 
data rather than the full set of data values. Periodically, say at the end of every hour, the summary
record is sent to a server. The server then aggre- 
gates the summary records across locations, work 
groups, business units and longer periods of time as 
required by system administrators investigating re- 
liability and performance issues. Server records are 
also fixed-length. 

Figure 2 shows a histogram of times to complete email transactions with SMTP or POP3 servers aggregated over 15 employees in a one-month period. The shortest transaction time is 1 ms, while the longest is $2.33 \times 10^5$ ms, or 233 seconds. No standard transformation of these data induces normality or even symmetry. Moreover, as would be expected when aggregating over agents and times, the histogram for the work group is multimodal.

Fig. 1. Data flow for monitoring networked software.

Fig. 2. Times to complete 41,928 email transactions over a one-month period.

Summarizing such data quickly and reliably while 
preserving as much information as possible about 
the entire distribution is especially challenging be- 
cause the transaction times are obtained sequen- 
tially across a group of end users, there is not enough 
memory to store all the data for many metrics on 
many transaction types before they are analyzed, 
and the data cannot be reduced to a small set of suf- 
ficient statistics by appealing to a parametric family 
of distributions. Simple statistical summaries such 
as the mean and variance are statistically inade- 
quate (unfortunately so, since they are inexpensive 
to compute). Under these circumstances, we prefer 
to summarize the distribution in terms of its median 
and tail quantiles. 

3. INCREMENTAL QUANTILES 

In statistical notation, agent a (the agent monitoring your computer, say) sees a multivariate data stream

$$X_a = \{X_{ast},\ s = 1, \dots, S,\ t = 1, 2, \dots\},$$

where $X_{ast}$ is the value of the sth metric (response time, e.g.) on the tth transaction (email access, e.g.) seen by agent a.

Users of the software application are typically or- 
ganized in multiple hierarchies according to geographic 
location and business unit. The interesting subsets 
of agents correspond to these hierarchies or to groups 
defined by common network infrastructure. Time 
adds another dimension, and the interesting periods 
may be five-minute periods, hours, days or months 
depending on the purpose of the analysis. Often, 
the agent hierarchy and time resolution are chosen 
dynamically as an analyst explores the data. But 
whatever the choices, the analyst must be provided with quantiles for the aggregated data $\{X_{ast} : a \in A,\ s \in S,\ t \in T\}$, where A, S and T are subsets of agents, metrics and time, respectively. Quantile estimates
for the aggregate data are produced from records 
that are periodically provided by agents. Each of the 
agent records in turn contains a set of quantile es- 
timates that were produced by that agent using the 
same kind of sequential updating algorithm that the
server uses. 

Sequential quantile estimation, which is called in- 
cremental quantile estimation in the computer sci- 
ence literature, is not a new topic. Robbins and 
Monro (1951) introduced the idea of stochastic ap- 
proximation for quantile estimation, for example. 
Munro and Paterson (1980) then used it for sorting 
and selection with limited memory, Tierney (1983) 
used it for monitoring computer simulations, and 
Chen, Lambert and Pinheiro (2000) used it for mon- 
itoring nonstationary user profiles. Stochastic ap- 
proximation is best suited for continuous data be- 
cause it requires an estimate of the density near the 
quantile. The data in our application, such as packet 
sizes, are often discrete and can have preferred
values and spikes, so any continuity assumption is 
suspect. Liechty, Lin and McDermott (2003) pro- 
posed an algorithm to estimate a single quantile 
by maintaining a buffer of data values that is in- 
tended to bracket the desired quantile. Their algo- 
rithm works well for simulated data, but it tracks 
only a single quantile. McDermott, Babu, Liechty 
and Lin (2003) extended the algorithm to track a 
prespecified set of quantiles. The Incremental Quantile (IQ) method has a different emphasis: it estimates whole distribution functions and combines those estimates into a general data-analytic tool. Future numeric comparisons with alternative
algorithms such as those referenced above may lead 
to improved estimates within this general approach. 

Computer scientists have considered sequential 
quantile estimation without density estimates, but 
with the twist that reported quantiles must be ob- 
served data values. See Manku, Rajagopalan and 
Lindsay (1998) and Greenwald and Khanna (2001, 
2004). Simply stated, these methods attempt to keep 
"typical" values, so that the goal is perhaps more 
akin to sorting the data than to estimating an un- 
derlying distribution. Our application does not have 
the constraint that quantile estimates must be ob- 
served data values. The advantage of the computer 
science methods is that they guarantee precision to 
within a prespecified error on the probability level of 
the quantile estimate. Such guarantees can be useful, 
but much less so when interest is in tail quantiles. 
For example, it may be adequate to estimate the me- 
dian to within the interval defined by the 0.49 and 
0.51 empirical quantiles, but a fixed ±0.01 error on 
the probability level is nearly useless for estimating the 0.999 quantile, because the band from 0.989 to 1.009 reaches all the way to the largest observation. In our application, interest centers on the accuracy of the estimated quantile value itself rather than its probability level.
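The asymmetry between the median and the 0.999 quantile can be made concrete with a small calculation. The Python sketch below assumes, purely for illustration, a standard log-normal distribution and reports the range of values whose probability levels lie within ±0.01 of p; `value_band` is a hypothetical helper, not part of the IQ method.

```python
import math
from statistics import NormalDist


def value_band(p, eps=0.01):
    """Values of a standard log-normal whose probability levels lie within
    +/- eps of p. The upper end is infinite once p + eps reaches 1."""
    z = NormalDist()  # log-normal quantile = exp(standard normal quantile)
    lo = math.exp(z.inv_cdf(max(p - eps, 1e-12)))
    hi = math.inf if p + eps >= 1 else math.exp(z.inv_cdf(p + eps))
    return lo, hi


for p in (0.5, 0.999):
    lo, hi = value_band(p)
    print(f"p = {p}: values from {lo:.3f} to {hi:.3f}")
```

Near the median the band is narrow (roughly 0.975 to 1.025), while around the 0.999 quantile it runs from about 9.9 up to the maximum of the data, so a rank guarantee says little about the value of the tail quantile.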

Three simple principles underlie our approach to 
sequentially estimating and aggregating quantiles: 

1. Empirical distributions are appropriate for all sorts 
of numerical data. 

2. Averaging cumulative distribution functions (CDFs) 
is easy. 

3. Converting a CDF to a set of quantiles and vice 
versa is straightforward. 

To aggregate sets of quantiles provided by many 
agents, we collect a batch of agent records until a 
fixed number has been reached, and then convert the 
quantiles on the records to empirical CDFs and the 
quantile record at the server to another CDF. Then 
we average the CDFs with appropriate weights and 
compute quantiles of the average CDF to complete 
one round of the aggregation algorithm. Of course, 
the way that a set of quantiles is converted to a 
CDF may affect the quality of the final estimates, 
as does the choice of the probability levels for the 
quantiles in each set. This procedure is simple, but 
it seems not to have been used previously. Details 
and performance comparisons are provided in the 
remainder of this paper. 
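A single round of this aggregation can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions: each summary is a hypothetical `(count, quantiles, levels)` tuple whose levels include 0 and 1, each CDF is interpolated linearly between stored quantiles without the trimming introduced in Section 4, and the helper names are ours.

```python
from bisect import bisect_right


def cdf_from_quantiles(qs, ps, x):
    """Piecewise-linear CDF through the points (qs[i], ps[i]).
    Assumes qs is sorted, ps[0] == 0 and ps[-1] == 1."""
    if x < qs[0]:
        return 0.0
    if x >= qs[-1]:
        return 1.0
    i = bisect_right(qs, x) - 1  # qs[i] <= x < qs[i + 1]
    x0, x1, p0, p1 = qs[i], qs[i + 1], ps[i], ps[i + 1]
    return p0 + (x - x0) / (x1 - x0) * (p1 - p0)


def merge_summaries(summaries, probs):
    """Average the CDFs of (count, qs, ps) summaries with weights proportional
    to count, then invert the average CDF at each level in probs."""
    total = sum(n for n, _, _ in summaries)
    knots = sorted({q for _, qs, _ in summaries for q in qs})
    avg = [sum(n * cdf_from_quantiles(qs, ps, x) for n, qs, ps in summaries) / total
           for x in knots]
    out = []
    for p in probs:
        j = next(k for k, f in enumerate(avg) if f >= p)  # first knot with F >= p
        if j == 0 or avg[j] == avg[j - 1]:
            out.append(knots[j])
        else:  # linear inversion between adjacent knots
            w = (avg[j] - p) / (avg[j] - avg[j - 1])
            out.append(w * knots[j - 1] + (1 - w) * knots[j])
    return out


a = (100, [0.0, 1.0, 2.0], [0.0, 0.5, 1.0])  # toy summary: 100 values on [0, 2]
b = (300, [2.0, 3.0, 4.0], [0.0, 0.5, 1.0])  # toy summary: 300 values on [2, 4]
print(merge_summaries([a, b], [0.0, 0.25, 0.5, 1.0]))
```

Because each interpolated CDF is piecewise linear with knots at its stored quantiles, the weighted average is piecewise linear with knots at the union of those quantiles, so inverting it by linear interpolation between adjacent knots is exact in this simplified setting.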

4. IQ AGENT ALGORITHM 

4.1 Requirements for Aggregation Algorithms 

The monitoring architecture requires two types of 
algorithms, one for the agent and one for the server. 
The agent algorithm should require only one con- 
tinuous pass through the data stream and should 
be lightweight in both memory and CPU usage be- 
cause many copies of the algorithm (one for each 
transaction type for each networked application and 
monitored quantity) will run in the background on 
the desktops of corporate users. Hourly records pro- 
duced by the algorithm should be fixed-length to 
simplify the design and small to reduce the burden 
of transmitting them to the server for further aggre- 
gation. 

Figure 3 depicts the major steps in the IQ agent 
algorithm. A data buffer D at the agent holds the most recent observations from a stream $\{X_1, X_2, \dots\}$. A quantile buffer Q corresponding to probability values $P_Q = (p_1, \dots, p_M)$ holds the quantiles $Q = (Q_1, \dots, Q_M)$ estimated from the data that have already been processed. When D fills with data, it is first used to update Q and then it is cleared in order to accumulate the next batch of data from the
stream. When a report is required, a predetermined 
subset of Q is provided to the server as a summary 
of the entire stream processed by the agent. Notice 
that more quantiles may be tracked in the Q-buffer 
than are reported in the agent summary to improve 
the accuracy of the agent record. 

At the server, a second algorithm summarizes agent 
records by estimating quantiles of the mixed distri- 
bution of their combined data. Like the agent al- 
gorithm, the server algorithm should be lightweight 
and operate in one pass through a set of agent records. 
Ideally, the server algorithm should create records of the same form as agent records to keep the design simple and to provide a uniform method for aggregating in stages up the levels of a hierarchy.

Details of the agent algorithm are provided in the 
remainder of this section. The server algorithm is 
discussed in Section 5. 

4.2 Updating the Q-Buffer 

Suppose that T data values have been processed with the IQ algorithm so that Q holds estimated quantiles of the set $\{X_1, \dots, X_T\}$. Then the data buffer D is filled with the next N values, $\{X_{T+1}, \dots, X_{T+N}\}$. When D is full, or at prespecified times, D is converted to an empirical CDF $F_D(x)$, Q is converted to a CDF $F_Q(x)$, and a weighted average of the two CDFs is computed. Quantiles of the average CDF are used to update Q.

Linearly interpolating $F_Q$ models the data as uniformly distributed between adjacent quantiles in Q, which is reasonable if no other information is available and the tails of the data are not overly long. If a variable such as round-trip time or transaction time has a long right tail, then accuracy is improved by applying the algorithm to log-transformed data or by using nonlinear interpolation as described in Section 6.

Fig. 3. Major steps in the IQ agent algorithm.

The updating algorithm has four basic steps, il- 
lustrated in Figure 4 and detailed as follows. 

For each $x \in Q \cup D$:

1. Compute the CDF of Q (Figure 4, left panel) as

$$F_Q(x) = \begin{cases} 0, & \text{if } x < Q_1,\\ 1, & \text{if } x \ge Q_M,\\ \operatorname{interp}(x, Q_m, Q_{m+1}, p^*_m, p^*_{m+1}), & \text{if } Q_m \le x < Q_{m+1},\ m = 1, \dots, M-1, \end{cases} \tag{1}$$

where interp interpolates the given points as

$$\operatorname{interp}(x, x_0, x_1, p_0, p_1) = p_0 + \frac{x - x_0}{x_1 - x_0}\,(p_1 - p_0)$$

(see Section 6 for nonlinear interpolation) and

$$p^*_m = \operatorname{median}(p_m,\ 0.5/T,\ 1 - 0.5/T),$$

which is $p_m$ trimmed to the interval $[0.5/T, 1 - 0.5/T]$. Trimming imposes jumps in the CDF at the minimum ($Q_1$) and maximum ($Q_M$) data values, so the minimum and maximum over all data values processed so far are kept in Q. This means that half of the $1/T$ mass associated with an extreme value (minimum or maximum) is allocated to an interval strictly less extreme than the observed value, and the other half of the $1/T$ mass is allocated to the extreme value itself. It may be reasonable to replace the jump with a smooth extrapolation, but then some extreme quantiles would extend beyond the range of the observed data, which we choose to avoid.

2. Compute the empirical CDF of D (Figure 4, center panel) and its left-continuous value as

$$F_D^+(x) = F_D(x) = \frac{|\{D \le x\}|}{|D|}, \qquad F_D^-(x) = \frac{|\{D < x\}|}{|D|}, \tag{2}$$

where $|\cdot|$ indicates the number of elements in the indicated set.

3. Compute the weighted average CDF (Figure 4, right panel) and its left-continuous value as

$$F^{\pm}(x) = \frac{T\,F_Q(x) + N\,F_D^{\pm}(x)}{T + N}.$$

For each $p_m \in P_Q$:

4. Compute the updated quantile $Q_m$ (Figure 4, arrows in right panel) by inverting the weighted average CDF as follows. Find the bracketing values

$$x^+ = \min\{x \in Q \cup D : F^+(x) \ge p_m\}, \qquad x^- = \max\{x \in Q \cup D : F^-(x) \le p_m\},$$

and set

$$Q_m = \begin{cases} x^+, & \text{if } x^+ = x^-,\\ \rho\,x^- + (1 - \rho)\,x^+, & \text{otherwise,} \end{cases} \tag{3}$$

where $\rho = [F^+(x^+) - p_m]/[F^+(x^+) - F^-(x^-)]$ for linear interpolation. The nonlinear case is discussed in Section 6.

Finally, refill Q with the updated quantiles and clear 
D in order to resume accumulating new data from 
the stream. 
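The four steps can be collected into a short Python sketch of one agent update with linear interpolation. The function names (`interp`, `cdf_q`, `iq_update`) are hypothetical, and two details are our own assumptions rather than part of the description above: on the first batch ($T = 0$) $F_Q$ receives zero weight, and when $x^- \ge x^+$ or the denominator of $\rho$ vanishes (a flat stretch of the average CDF) the sketch simply returns $x^+$. The small example mirrors the buffer sizes of Figure 4.

```python
from bisect import bisect_left, bisect_right


def interp(x, x0, x1, p0, p1):
    """Linear interpolation through (x0, p0) and (x1, p1)."""
    return p0 + (x - x0) / (x1 - x0) * (p1 - p0)


def cdf_q(x, q, probs, t):
    """Equation (1): CDF of the sorted quantile buffer q with levels probs,
    each level trimmed to [0.5/t, 1 - 0.5/t]."""
    if x < q[0]:
        return 0.0
    if x >= q[-1]:
        return 1.0
    m = bisect_right(q, x) - 1  # index with q[m] <= x < q[m + 1]
    lo, hi = 0.5 / t, 1 - 0.5 / t
    p0 = min(max(probs[m], lo), hi)  # median(p_m, 0.5/t, 1 - 0.5/t)
    p1 = min(max(probs[m + 1], lo), hi)
    return interp(x, q[m], q[m + 1], p0, p1)


def iq_update(q, probs, t, d):
    """Fold the data buffer d into the quantile buffer q, which summarizes
    t earlier values. Returns the updated quantiles and t + len(d)."""
    n = len(d)
    ds = sorted(d)
    points = sorted(set(q) | set(d))  # candidate values x in Q u D

    def average(x, count):
        # Step 3: (T F_Q(x) + N F_D(x)) / (T + N), with N F_D(x) = count.
        fq = cdf_q(x, q, probs, t) if t > 0 else 0.0
        return (t * fq + count) / (t + n)

    f_plus = [average(x, bisect_right(ds, x)) for x in points]  # |D <= x|
    f_minus = [average(x, bisect_left(ds, x)) for x in points]  # |D < x|

    new_q = []
    for p in probs:  # Step 4: invert the average CDF at each level
        i_hi = next(i for i, f in enumerate(f_plus) if f >= p)  # x+
        lows = [i for i, f in enumerate(f_minus) if f <= p]
        i_lo = lows[-1] if lows else i_hi  # x-
        denom = f_plus[i_hi] - f_minus[i_lo]
        if i_lo >= i_hi or denom <= 0:
            new_q.append(points[i_hi])  # x+ == x- or a flat stretch: use x+
        else:
            rho = (f_plus[i_hi] - p) / denom  # equation (3)
            new_q.append(rho * points[i_lo] + (1 - rho) * points[i_hi])
    return new_q, t + n


probs = [0.0, 0.25, 0.5, 0.75, 1.0]
q, t = [], 0
for batch in (range(1, 11), range(11, 21)):  # two full data buffers D of size 10
    q, t = iq_update(q, probs, t, list(batch))
print(t, q)  # 20 [1, 5, 10, 15, 20]
```

Because the levels include 0 and 1, the first and last entries of Q track the running minimum and maximum, which is what the trimming rule in (1) relies on.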

The quality of IQ quantile estimates depends on the quality of the estimate of the CDF F (i.e., $F^{\pm}$) from which they are computed, which in turn depends on the buffer sizes and probability levels $P_Q$.
In particular, the assumed F is linear between dis- 
tinct adjacent quantiles (or linear on a transformed 
scale), and this may be a better assumption over 
small intervals than over long intervals. Thus, keep- 
ing more quantiles in Q is desirable, even if only a 
few quantiles can be reported ultimately. 



When all the data have been processed, an agent record can be formed to summarize the results. The agent record is $(T_a, R_a)$, where $T_a$ is the total number of observations processed and $R_a$ is typically a fixed subset of the quantiles in Q, including the minimum and maximum values. However, if $T_a$ is smaller than the record size, then all the raw data values are inserted into $R_a$.
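A record of this form might be assembled as follows. This is a hypothetical layout: the tagged tuple, the function name `agent_record`, and the requirement that the reported levels be drawn exactly from $P_Q$ are assumptions made for the sketch, and an agent would have to retain its raw values only while $T_a$ is still small.

```python
def agent_record(q, probs, t, report_probs, raw_values):
    """Hypothetical agent record (T_a, R_a).

    q, probs: the agent's quantile buffer Q and its levels P_Q.
    report_probs: fixed subset of levels to report (including 0 and 1),
        each of which must appear exactly in probs.
    raw_values: all values seen so far, needed only while t is small.
    """
    record_size = len(report_probs)
    if t < record_size:
        # Too few observations: ship the raw values themselves.
        return t, ("raw", sorted(raw_values))
    by_level = dict(zip(probs, q))
    return t, ("quantiles", [by_level[p] for p in report_probs])


print(agent_record([1, 5, 10, 15, 20], [0.0, 0.25, 0.5, 0.75, 1.0], 20,
                   [0.0, 0.5, 1.0], []))
```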

4.3 An Example of IQ Updating 

As an example, consider the transaction time data shown in Figure 2. Empirical quantiles (EQ) were computed in the standard way by sorting all the test data, and IQ quantiles were computed using buffer sizes $|D| = |Q| = 100$ and linear interpolation on the log-transformed data. Even on the log scale, however, the data remained long-tailed. The probabilities in $P_Q$ were 0 and 1 (corresponding to the minimum and maximum data values) and 98 probabilities uniformly spaced from 0.0025 to 0.9975 on the $\log(p/(1-p))$ scale, so that more quantiles are devoted to tail probabilities.
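The probability levels used in this example can be generated directly. The sketch below (hypothetical helper `logit_levels`) places 98 levels at equal steps of $\log(p/(1-p))$ between 0.0025 and 0.9975 and adds 0 and 1 for the minimum and maximum, giving the 100 levels of Q.

```python
import math


def logit(p):
    """g(p) = log(p / (1 - p))."""
    return math.log(p / (1 - p))


def logit_levels(k=98, lo=0.0025, hi=0.9975):
    """Levels 0 and 1 plus k levels at equal steps of logit(p) from lo to hi."""
    a, b = logit(lo), logit(hi)
    inner = [1 / (1 + math.exp(-(a + (b - a) * i / (k - 1)))) for i in range(k)]
    return [0.0] + inner + [1.0]


levels = logit_levels()
print(len(levels), levels[:4])
```

Equal logit steps crowd the levels toward 0 and 1, which is what devotes more of Q to the tails.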

Table 1 shows the IQ and EQ estimates, their dif- 
ferences, and approximate EQ standard errors com- 
puted by plugging a local density estimate into the 
asymptotic standard error formula. The IQ estimates 
reproduce the EQ values well with differences never 
more than two standard errors of the empirical quan- 
tiles. 
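The standard errors in Table 1 rest on the asymptotic result that the empirical p-quantile has standard error approximately $\sqrt{p(1-p)/n}\,/\,f(q_p)$, where f is the density at the true quantile $q_p$. The local density estimate used for Table 1 is not specified here, so the sketch below plugs in one possible choice, a difference quotient of empirical quantiles over $[p - \delta, p + \delta]$; that choice, the default $\delta = 0.05$, and the helper names are assumptions.

```python
import math
import random


def empirical_quantile(xs_sorted, p):
    """Inverse empirical CDF: the smallest data value x with F_n(x) >= p."""
    k = max(math.ceil(p * len(xs_sorted)), 1)
    return xs_sorted[k - 1]


def quantile_se(xs, p, delta=0.05):
    """Approximate s.e. of the empirical p-quantile, sqrt(p(1-p)/n) / f(q_p),
    with f(q_p) estimated by a difference quotient of empirical quantiles
    (an assumed density estimate; it fails if the spacing is zero)."""
    xs_sorted = sorted(xs)
    n = len(xs_sorted)
    lo, hi = max(p - delta, 0.0), min(p + delta, 1.0)
    spacing = empirical_quantile(xs_sorted, hi) - empirical_quantile(xs_sorted, lo)
    density = (hi - lo) / spacing
    return math.sqrt(p * (1 - p) / n) / density


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# For the N(0, 1) median, theory gives sqrt(0.25 / 10000) / 0.3989, about 0.0125.
print(quantile_se(xs, 0.5))
```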
Fig. 4. Quantile updating with Q of size 5 with probabilities $P_Q = (0, 0.25, 0.5, 0.75, 1)$ and D of size 10. Q has been updated twice, so $T = 20$. The left plot shows $F_Q$ before updating, where vertical segments indicate the stored quantiles. The middle plot shows the ten data values in D as ticks on the horizontal axis and the empirical CDF $F_D$. The right plot shows the updated F (a weighted average of $F_Q$ and $F_D$). The updated quantiles for Q are shown as ticks along the horizontal axis.
Fig. 5. Major steps in the IQ server algorithm.

5. IQ SERVER ALGORITHM 

The next task is to merge sets of agent quantiles to 
estimate performance for a set of users, or to merge 
server quantiles to obtain estimates for combined 
work groups or longer periods of time, for exam- 
ple. To be specific, this section describes merging 
of agent records, but the ideas also apply to higher 
levels of aggregation. Figure 5 illustrates the major 
steps: agent summary records are placed into a data 
buffer D; when D is full it is used to update a quan- 
tile buffer Q; once all records have been processed, 
a subset of Q is selected to form a summary record 
of the aggregation. 

As in the agent algorithm, Q holds the approximate quantiles $Q = (Q_1, \dots, Q_M)$ with corresponding probability levels $P_Q$. These quantiles are a summary of all agent records that have been processed so far. When Q is updated, two ancillary quantities are also updated: $N_a$, the total number of agent records that have been processed, and $T$, the total number of data values represented by the $N_a$ agents.

Table 1
IQ estimated quantiles compared to empirical quantiles (EQ) of the 41,928 transaction times illustrated in Figure 2

Probability level p    0.5   0.75    0.9   0.95   0.99   0.995
IQ                     190    323    821   1338   4674    5154
EQ                     189    320    826   1280   4807    5147
Difference (IQ - EQ)     1      3     -5     58   -133       7
2 × s.e.(EQ)           1.3    5.4     32     72    134     130

For IQ, D and Q both have size 100. Absolute differences between IQ and EQ are less than two standard errors of the empirical quantiles.

D holds the next set of agent records to be included in the aggregation, some of which contain quantiles and some of which may contain raw data values. The combined set of raw data values over all records in D is denoted by $X = \{X_1, \dots, X_N\}$. A quantile record from agent a is denoted $(T_a, R_a)$, where $T_a$ is the number of values represented and $R_a = (R_{a,1} \le \dots \le R_{a,I})$ is a vector of I quantiles with probability levels $P_R$, including both 0 and 1.

Updating Q at the server is similar to updating 
Q at the agent. Both D and Q are converted to 
CDFs, the CDFs are averaged, and then the average 
is inverted to update Q. 

For each $x \in Q \cup D$:

1. Compute $F_Q(x)$ using (1).

2. Compute the CDF $F_a(x)$ of each set of agent quantiles using (1) with $R_a$ and $P_R$ in place of Q and $P_Q$, respectively.

3. Compute the empirical CDF $F_X^+(x)$ of the data values $X \subset D$ and its left-continuous value $F_X^-(x)$, using (2) with X in place of D.

4. Compute the weighted average CDF and its left-continuous value as

$$F^{\pm}(x) = \frac{T\,F_Q(x) + N\,F_X^{\pm}(x) + \sum_a T_a F_a(x)}{T + N + \sum_a T_a}.$$

For each $p_m \in P_Q$:

5. Compute the updated quantile estimate $Q_m$ by inverting $F^{\pm}(x)$ using (3), where the definitions of the bracketing values $x^+$ and $x^-$ are unchanged.

Finally, refill Q with the updated quantile estimates, 
clear D, and resume accumulating new records. 
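Only the averaging step differs from the agent algorithm, so the sketch below isolates it. It is a minimal Python sketch under stated assumptions: each agent record is a hypothetical `(T_a, R_a, P_R)` tuple, raw values pooled from small records are passed separately as X, and each agent CDF $F_a$ is trimmed with its own $T_a$ in (1), which the description above leaves implicit. The inversion in step 5 would reuse the agent-side bracketing logic unchanged.

```python
from bisect import bisect_left, bisect_right


def quantile_cdf(x, qs, ps, t):
    """Equation (1) for a sorted quantile vector qs with levels ps,
    levels trimmed to [0.5/t, 1 - 0.5/t]."""
    if x < qs[0]:
        return 0.0
    if x >= qs[-1]:
        return 1.0
    m = bisect_right(qs, x) - 1  # qs[m] <= x < qs[m + 1]
    p0 = min(max(ps[m], 0.5 / t), 1 - 0.5 / t)
    p1 = min(max(ps[m + 1], 0.5 / t), 1 - 0.5 / t)
    return p0 + (x - qs[m]) / (qs[m + 1] - qs[m]) * (p1 - p0)


def server_cdf(x, q, pq, t, raw, agents):
    """Step 4 of the server update: returns (F+(x), F-(x)).
    q, pq, t: server quantile buffer Q, its levels P_Q and count T.
    raw: raw values X pooled from records in D.
    agents: list of quantile records (T_a, R_a, P_R)."""
    xs = sorted(raw)
    n = len(xs)
    q_part = t * quantile_cdf(x, q, pq, t) if t > 0 else 0.0
    a_part = sum(ta * quantile_cdf(x, ra, pr, ta) for ta, ra, pr in agents)
    total = t + n + sum(ta for ta, _, _ in agents)
    f_plus = (q_part + bisect_right(xs, x) + a_part) / total  # N F_X(x) = |X <= x|
    f_minus = (q_part + bisect_left(xs, x) + a_part) / total  # N F_X^-(x) = |X < x|
    return f_plus, f_minus


agents = [(20, [1, 5, 10, 15, 20], [0.0, 0.25, 0.5, 0.75, 1.0]),
          (10, [10, 20, 30], [0.0, 0.5, 1.0])]
raw = [25, 35]  # toy values from an agent that saw too few to send quantiles
print(server_cdf(25, [], [], 0, raw, agents))
```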

When the full set of agent records has been pro- 
cessed, a server record is produced to summarize 
the result. The server record consists of $T$, $N_a$ and
a subset of the quantile estimates in Q, including 
the minimum and maximum values. A set of server 
records of this form can be aggregated further by ap- 
plying the IQ server algorithm a second time. Aggre- 
gation can thus proceed hierarchically, as Section 8 
illustrates. 

6. ALGORITHM ENHANCEMENTS 

Increasing the sizes of D and Q improves accu- 
racy. A larger D allows the subtle features of the 
underlying distribution to be better represented in
the empirical CDF before folding into Q. A larger Q 
reduces interpolation errors because interpolation is 
used over shorter intervals. 

If memory cannot be increased, it is sometimes 
desirable to sacrifice accuracy in the central quan- 
tiles for improved accuracy in the tails. This trade- 
off can be achieved by manipulating the probability 
levels Pq associated with quantiles in Q. Generally, 
if good accuracy is desired for a quantile with proba- 
bility level p, then it is helpful for Pq to place prob- 
ability values more densely near p. But focusing on p 
leaves fewer probabilities elsewhere with the result 
that, while accuracy of the pth quantile improves, 
accuracy of other quantiles degrades. We have used 
probability levels that are either uniformly spaced between 0 and 1 or uniformly spaced on the scale $\log(p/(1-p))$, as in the example in Section 4.3.

Tail quantile accuracy may also be improved by 
applying nonlinear interpolation to Q, which is equiv- 
alent to applying linear interpolation to a transfor- 
mation of Q. In most applications it is not fea- 
sible to determine an optimal transformation be- 
cause the shape of the distribution is unknown, so 
it is often desirable to choose a transformation that 
performs well over a wide variety of datasets. The 
performance study in Section 7 compares uniformly spaced probability values and linear interpolation with logit-spaced probability values and logit interpolation, which is defined by taking

$$\operatorname{interp}(x, x_0, x_1, p_0, p_1) = g^{-1}\!\left[g(p_0) + \bigl(g(p_1) - g(p_0)\bigr)\frac{x - x_0}{x_1 - x_0}\right]$$

in (1) and

$$\rho = \frac{g(F^+(x^+)) - g(p_m)}{g(F^+(x^+)) - g(F^-(x^-))}$$

in (3), where $g(p) = \log(p/(1-p))$ is the logit function and $g^{-1}(x) = 1/(1 + \exp(-x))$ is its inverse. In principle, g should be chosen so that $g(F(x))$ is nearly linear, but F is unknown. Although logit interpolation may not be optimal, it should be better than linear interpolation if exponential tails are expected.
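Logit interpolation is straightforward to implement once g and its inverse are available. The sketch below (hypothetical helper names) implements the two formulas above; it assumes every probability passed to `logit` lies strictly between 0 and 1, which the trimming in (1) guarantees for the interpolated levels but which a caller would still need to enforce when, for example, $F^-(x^-) = 0$.

```python
import math


def logit(p):
    """g(p) = log(p / (1 - p)); p must lie strictly inside (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """g^{-1}(z) = 1 / (1 + exp(-z))."""
    return 1 / (1 + math.exp(-z))


def interp_logit(x, x0, x1, p0, p1):
    """Logit interpolation for (1): linear in g(p) between (x0, p0) and (x1, p1)."""
    z = logit(p0) + (logit(p1) - logit(p0)) * (x - x0) / (x1 - x0)
    return inv_logit(z)


def rho_logit(p_m, f_hi, f_lo):
    """Logit-scale weight for (3), with f_hi = F+(x+) and f_lo = F-(x-);
    the update is then Q_m = rho * x_lo + (1 - rho) * x_hi."""
    return (logit(f_hi) - logit(p_m)) / (logit(f_hi) - logit(f_lo))


# Halfway between two stored quantiles at levels 0.01 and 0.5:
print(interp_logit(0.5, 0.0, 1.0, 0.01, 0.5))  # logit scale
print(0.01 + 0.5 * (0.5 - 0.01))               # linear scale
```

At the midpoint, logit interpolation returns a level near 0.09 rather than the linear 0.255, reflecting the convex shape of a CDF with an exponential lower tail.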

7. PERFORMANCE OF THE AGENT ALGORITHM

The core of our network monitoring methodology 
is the IQ agent algorithm that computes incremen- 
tal quantiles from raw data. To study its perfor- 
mance, we simulated it with D and Q of size 41 each. 
Linear interpolation with uniform probability values 
(shown as inner ticks along the top axes in Figure 6) 
and logit interpolation with logit probability values 
(inner ticks along the bottom axes) were used in the 
simulation. The logit probabilities are actually at 41 
convenient round values that are approximately uni- 
formly spaced on the logit scale. Three distributions 
are considered: the standard normal, standard log- 
normal and beta(9, 2), which has a very long left tail 
and a sharp rise to a mode in the right tail. Quantiles
were estimated after 1000 and 10,000 independent 
observations, which implies that the buffers were 
emptied 24 and 243 times, respectively, and then 
one more time at the 1000th and 10,000th observa- 
tions, respectively. 
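The buffer counts quoted above follow from integer division by the buffer size of 41. The sketch below (hypothetical helper `buffer_flushes`) checks the arithmetic.

```python
def buffer_flushes(n_obs, buffer_size=41):
    """Full data buffers folded into Q while reading n_obs values, and the
    number of values left in the partially filled buffer at the end."""
    return divmod(n_obs, buffer_size)


for n in (1000, 10_000):
    full, left = buffer_flushes(n)
    print(f"N={n}: {full} full flushes, then 1 more with {left} values")
# N=1000: 24 full flushes, then 1 more with 16 values
# N=10000: 243 full flushes, then 1 more with 37 values
```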

Simulated performance is measured by the ratio of 
the root mean squared errors (RMSEs) of the IQ and 
empirical quantile (EQ) estimates where the RMSEs 
are computed over 1000 runs of the simulation. The 
horizontal axes in Figure 6 are on the logit scale to 
show the behavior of the extreme quantiles. 
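For one probability level, the plotted ratio can be computed from the simulation output as below. The sketch assumes the IQ and EQ estimates from the 1000 runs are held in two lists alongside the true quantile; the helper names and the toy numbers in the usage line are illustrative only.

```python
import math


def rmse(estimates, truth):
    """Root mean squared error of repeated estimates of one true quantile."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def rmse_ratio(iq_estimates, eq_estimates, truth):
    """RMSE(IQ) / RMSE(EQ) at one probability level; each list holds one
    estimate per simulation run."""
    return rmse(iq_estimates, truth) / rmse(eq_estimates, truth)


print(rmse_ratio([0.9, 1.3], [1.05, 0.95], 1.0))  # toy inputs, not study data
```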

Not surprisingly, Figure 6 shows that uniformly 
spaced probability values and linear interpolation 
perform poorly in the tails of the normal distribu- 
tion. At N = 1000 and p = 0.005, the IQ RMSE 
is about four times the EQ RMSE. Moreover, relative performance degrades with N. By N = 10,000
the IQ RMSE is about 20 times larger than the 
EQ RMSE. Plots not shown here suggest that this 
degradation is due to the bias in the IQ estimates 
which does not diminish with N. Similarly, Figure 6 
shows that linear interpolation and uniformly spaced 
probability values do not provide good performance 
in the long right tail of the log-normal and the long 
left tail of the beta, and that performance relative 
to the EQ estimates degrades with N, again due to 
bias. At the 0.99 quantile of the log-normal, the ra- 
tio of RMSEs is about 15 for N = 1000 and about 
75 for N = 10,000. A similar pattern is seen near the 
0.01 quantile for the beta distribution. That is, when 
the uniform scale does not tame the tails of a dis- 
tribution sufficiently, the IQ estimates with uniform 
probabilities and linear interpolation may be notice- 
ably worse than the empirical quantiles. The RMSEs 
of the IQ and EQ estimates are nearly identical for 
the most extreme quantiles under all distributions 
because these are computed from the minimum and 
maximum data values, which the IQ algorithm keeps 
in Q. 

For logit probability values and logit interpola- 
tion, there are ripples in the ratio of IQ RMSE 
to EQ RMSE in the center of the log-normal and
beta distributions. These ripples become more pro- 
nounced with increasing N. The low points of the 
ripples occur for quantiles that are kept in Q, while 
the high points are between adjacent quantiles. De- 
grading relative performance with increasing N is 
again due to the bias in the IQ estimates that oc- 
curs in regions where the density changes rapidly 
with respect to the logit-spaced probability levels. 
But in all cases, the IQ RMSE is within a factor of 
2 of the EQ RMSE even though the IQ algorithm 
never computes with more than 82 data values while 
the empirical quantiles require knowing all 1000 or 
10,000 data values at once. In this sense, the IQ al- 
gorithm produces usable estimates over a range of 
distributions. 

A second simulation experiment with log-normal 
data, logit-spaced p's, logit interpolation, and D- 
and Q-buffers of size 1000 was run to focus on the 
behavior of IQ estimated quantiles for large N. The 
quality of the IQ estimates was evaluated at N = 
W K , for k = 3, 4, . . . , 7. For all values of k, the IQ 
RMSE tracked the EQ RMSE closely in the mid- 
dle of the distribution. For instance, the ratio of IQ 
RMSE to EQ RMSE averaged over the middle 95% 
of the log-normal, pG (0.025,0.975), increases from 
1.00000 at N = 10 3 to 1.01338 at N = 10 7 , an in- 
crease of only about 1%. The ratio of IQ RMSE to 
EQ RMSE does increase more with N in the tails. 



For example, at p = 0.99 the ratio increases 31.5% 
as N increases from 10 6 to 10 7 , but even this bias 
would not make the IQ estimates unusable in our 
application. Thus, the IQ estimates are adequate if 
the probability levels for the Q and interpolation 
schemes are suitable. 
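The RMSE ratios used throughout this section can be computed as in the following sketch. It assumes that each simulation replicate yields IQ and empirical estimates on a common grid of probability levels with known true quantiles. The helper names are hypothetical, and "averaged over the middle 95%" is read here as an unweighted mean of the per-level ratios for levels strictly between 0.025 and 0.975.

```python
import math


def rmse(estimates, truth):
    """Root mean squared error of replicate estimates about the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def rmse_ratio_profile(iq_reps, eq_reps, true_q):
    """IQ RMSE / EQ RMSE at each probability level.

    iq_reps[r][k] and eq_reps[r][k] hold the estimate of quantile k in
    simulation replicate r; true_q[k] is the true quantile at that level.
    """
    ratios = []
    for k, truth in enumerate(true_q):
        iq_rmse = rmse([rep[k] for rep in iq_reps], truth)
        eq_rmse = rmse([rep[k] for rep in eq_reps], truth)
        ratios.append(iq_rmse / eq_rmse)
    return ratios


def window_average(ratios, probs, lo=0.025, hi=0.975):
    """Unweighted mean of the ratios at levels lo < p < hi (the middle 95%)."""
    inside = [r for r, p in zip(ratios, probs) if lo < p < hi]
    return sum(inside) / len(inside)


iq_reps = [[1.0, 2.0], [3.0, 2.0]]
eq_reps = [[1.5, 2.5], [2.5, 1.5]]
print(rmse_ratio_profile(iq_reps, eq_reps, [2.0, 2.0]))  # [2.0, 0.0]
```

A ratio of 1 means the IQ estimate is as accurate as the empirical quantile at that level; values above 1 measure the price paid for never holding all N values at once.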

8. PERFORMANCE OF THE AGGREGATED 
GROUP QUANTILES 

Networked software monitoring focuses on the quan- 
tiles of the performance experienced by groups of 
users. We explore the behavior of the aggregated 
quantiles that are computed by the IQ server algo- 
rithm in this section. 

Transaction Time Data. The data shown in Fig- 
ure 2 represent 41,928 email transactions for 15 cor- 
porate users over one month. Hourly sets of quan- 
tiles were computed for each user, and the hourly 
user quantiles were aggregated to produce hourly 
records for the group of 15 users. Finally, the hourly 
group records were aggregated to produce daily quan- 
tile estimates for the group. 
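The two-stage aggregation above (user records to hourly group records, then hourly group records to daily ones) can be mimicked with a simplified sketch. It assumes that every record carries its count and that the group CDF is the count-weighted mixture of the records' linearly interpolated CDFs, inverted by bisection. It leaves out the D- and Q-buffers of the IQ server algorithm, and all names are hypothetical.

```python
import bisect


def record_cdf(x, quants, probs):
    """CDF implied by one quantile record, interpolating linearly between entries.

    Assumes quants is sorted and probs runs from 0 (minimum) to 1 (maximum).
    """
    if x < quants[0]:
        return 0.0
    if x >= quants[-1]:
        return 1.0
    j = bisect.bisect_right(quants, x) - 1  # quants[j] <= x < quants[j + 1]
    q_lo, q_hi = quants[j], quants[j + 1]
    return probs[j] + (x - q_lo) / (q_hi - q_lo) * (probs[j + 1] - probs[j])


def aggregate_records(records, out_probs, iters=100):
    """Merge (count, quants, probs) records into one record at out_probs.

    Simplified stand-in for the IQ server step: the group CDF is taken to be
    the count-weighted mixture of the records' CDFs, inverted by bisection.
    Returns (total_count, quantiles) so the result can be aggregated again.
    """
    total = sum(n for n, _, _ in records)
    lo = min(q[0] for _, q, _ in records)
    hi = max(q[-1] for _, q, _ in records)

    def mix_cdf(x):
        return sum(n * record_cdf(x, q, p) for n, q, p in records) / total

    out = []
    for p in out_probs:
        a, b = lo, hi
        for _ in range(iters):  # shrink [a, b] around the smallest x with F(x) >= p
            mid = 0.5 * (a + b)
            if mix_cdf(mid) < p:
                a = mid
            else:
                b = mid
        out.append(b)
    return total, out


# Two hypothetical hourly group records merged into a daily record.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
hour1 = (1, [0.0, 0.25, 0.5, 0.75, 1.0], levels)
hour2 = (1, [1.0, 1.25, 1.5, 1.75, 2.0], levels)
n_day, day = aggregate_records([hour1, hour2], levels)
print(n_day, [round(q, 6) for q in day])  # 2 [0.0, 0.5, 1.0, 1.5, 2.0]
```

Because `aggregate_records` returns a total count together with the quantiles, its output can be fed back in as a record, which is how hourly group records would be rolled up into a daily record in this sketch.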

The IQ agent (user) and server (group) algorithms 
both used D- and Q-buffers of size 100 with
uniformly spaced probabilities $P_Q$ and linear
interpolation on log transaction times. Each agent and server
record contained only 11 quantiles corresponding to
probability levels $P_R = \{0, 0.05, 0.10, 0.25, 0.50, 0.75,
0.90, 0.95, 0.99, 0.999, 1\}$.
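To show what such an 11-quantile record looks like when it is read off a Q-buffer, here is a minimal sketch. It assumes a buffer of base-10 log transaction times stored at the uniformly spaced levels $0, 1/(J-1), \ldots, 1$ (the exact spacing of $P_Q$ and the log base are assumptions), interpolates linearly on the log scale and transforms the result back. `record_from_q` is a hypothetical name.

```python
P_R = [0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 0.999, 1]


def record_from_q(q_log10, p_out=P_R):
    """Read a short record at levels p_out off a Q-buffer of log10 times.

    Assumes (for illustration) the buffer stores J quantiles at the uniformly
    spaced levels 0, 1/(J-1), ..., 1. Interpolation is linear on the log scale,
    and the result is transformed back to the original time scale.
    """
    J = len(q_log10)
    record = []
    for p in p_out:
        pos = p * (J - 1)           # fractional position in the buffer
        j = min(int(pos), J - 2)    # left neighbour, kept inside the buffer
        w = pos - j                 # weight on the right neighbour
        log_q = q_log10[j] + w * (q_log10[j + 1] - q_log10[j])
        record.append(10 ** log_q)
    return record


q_log10 = [2 * i / 99 for i in range(100)]  # hypothetical buffer: times from 1 to 100
rec = record_from_q(q_log10)
print(len(rec), round(rec[4], 6))  # 11 10.0
```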
Fig. 6. Performance of IQ on three distributions. Logit p's combined with logit interpolation perform well in the tails but
generally not as well as uniform p's and linear interpolation for the center of the distribution. As the sample size increases
from 1000 to 10,000, IQ performance degrades relative to empirical quantiles because the IQ estimates are biased whereas
empirical quantiles are not.
Figure 7 compares the incremental and empirical
quantiles for a week of hourly group records and Fig- 
ure 8 compares these quantiles for a month of daily 
aggregates. In each figure, the lower line tracks em- 
pirical medians and the upper line tracks empirical 
0.9 quantiles. Darker vertical segments connect em- 
pirical quantiles to the corresponding IQ estimates, 
so longer lines correspond to poorer IQ estimates. 
Figure 7 represents results after two stages of pro- 
cessing: one at the agent and one at the server. Fig- 
ure 8 shows results after an additional application of 
the IQ server algorithm to compute daily quantiles. 

The IQ estimates track empirical quantiles reason- 
ably well, especially at the daily level where most 
differences are imperceptible. At the hourly level, 
some errors in the 0.9 quantiles are noticeable, but 
this reflects the limits on the accuracy that can be 
achieved when each agent record consists of only 11 
quantiles. Table 2 reports the fraction of cases in 
which incremental quantiles were within 10% of the 
correct empirical values. 
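The criterion behind Table 2 reduces to a simple count. The sketch below reads "within 10%" as an absolute difference of at most 10% of the empirical value, which is our interpretation; `fraction_within` is a hypothetical helper that would be applied separately to each quantile level and each aggregation level.

```python
def fraction_within(iq, eq, rel_tol=0.10):
    """Share of cases with |IQ - EQ| <= rel_tol * |EQ|.

    Reading "within 10%" as relative error against the empirical value is
    our interpretation of the criterion.
    """
    if len(iq) != len(eq) or not eq:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, b in zip(iq, eq) if abs(a - b) <= rel_tol * abs(b))
    return hits / len(eq)


print(fraction_within([1.05, 2.5, 0.95, 1.0], [1.0, 2.0, 1.0, 1.0]))  # 0.75
```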

Simulated Inhomogeneous Agents. The data from 
different users of networked applications are typi- 
cally not homogeneous because their network en- 
vironments and software usage differ. Here we re- 
port the results of a simulation that gives some in- 
sight into how the IQ algorithm responds to outly- 
ing users. These results also address the question of 
whether the order in which the records from inho- 
mogeneous agents are received matters, given that 
the server processes records sequentially. The fol- 
lowing simulation is meant to be realistic, but only 
exemplary because it is not possible to test or even 
specify the full range of conditions that could be en- 
countered in a real network monitoring application. 

In the simulation, agent records of length I = 10 
(i.e., 10 quantile estimates, not raw data values) 
are constructed for 1000 agents independently: 99% 
of the agents are nominal and 1% are outlying. In 
either case, the simulated record $R_a$ for agent $a$
($a = 1, \ldots, 1000$) is formed as follows. First an i.i.d.
sample of $T_a = 1000$ values is drawn from a log-normal
(base 10) distribution:

$$X_{a,t} \mid m_a \sim 10^{N(m_a,\, V_1)},$$

where $t = 1, \ldots, 1000$.



The agent record consists of $I = 10$ empirical quantiles
$R_a = (R_{a,1} < \cdots < R_{a,10})$ corresponding to
probabilities of 0, 1 and eight values equally spaced
between 0.005 and 0.995 on the logit scale. The medians
$m_a$ of the logged agent distributions are independent
and normally distributed given $M_a$ (so the agent medians
on the original scale, $10^{m_a}$, are log-normal):

$$m_a \mid M_a \sim N(M_a,\, V_2), \qquad M_a = \begin{cases} 0, & \text{with probability } 0.99,\\ 2, & \text{otherwise.} \end{cases}$$



Nominal agents are those with $M_a = 0$; outliers are
those with $M_a = 2$. We set $V_1 = V_2 = 0.0924$, resulting in

$$\frac{Q(0.99 \mid M_a)}{Q(0.01 \mid M_a)} \approx 100 \quad \text{for } M_a = 0 \text{ and } 2,$$



where $Q(p \mid M_a)$ is the $p$th quantile of $[X_{a,t} \mid M_a]$. That
is, the central 98% of nominal data cover two orders
of magnitude, as do the central 98% of outlying data.
Furthermore, with $M_a$ taking values of 0 and 2, the
outlying data are centered two orders of magnitude 
larger than the nominal data. The complete mixture 
covers about four orders of magnitude between its 
0.01 and 0.999 quantiles. Note, however, that agents 
are not homogeneous. Both nominal and outlying 
agents have random medians and thus each agent 
record summarizes a different distribution of data. 
Agent records constructed using empirical quantiles 
as above do not have any errors associated with 
agent-level IQ estimation. Thus, this simulation only 
considers performance of the server-level algorithm. 
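The agent records described above can be generated with the following sketch, written from the model as stated. Two details are our assumptions: the empirical quantiles interpolate linearly between order statistics, and the helper names are hypothetical. The function `central_ratio` also checks the choice $V_1 = V_2 = 0.0924$: given $M_a$, $\log_{10} X_{a,t} \sim N(M_a, V_1 + V_2)$, so $\log_{10}[Q(0.99 \mid M_a)/Q(0.01 \mid M_a)] = 2 z_{0.99}\sqrt{0.1848} \approx 2.000$, where $z_{0.99} \approx 2.326$ is the standard normal 0.99 quantile.

```python
import math
import random
from statistics import NormalDist


def logit_grid(lo, hi, n):
    """n probabilities equally spaced on the logit scale from lo to hi."""
    a = math.log(lo / (1.0 - lo))
    b = math.log(hi / (1.0 - hi))
    return [1.0 / (1.0 + math.exp(-(a + (b - a) * i / (n - 1)))) for i in range(n)]


def empirical_quantile(xs, p):
    """Empirical p-quantile of sorted xs, interpolating between order statistics."""
    if p >= 1.0:
        return xs[-1]
    pos = p * (len(xs) - 1)
    j = int(pos)
    return xs[j] + (pos - j) * (xs[j + 1] - xs[j])


def simulate_agent_record(rng, v1=0.0924, v2=0.0924, t_a=1000):
    """Return (M_a, record) for one agent: 10 quantiles at levels 0, 1 and
    eight logit-spaced values between 0.005 and 0.995."""
    big_m = 0.0 if rng.random() < 0.99 else 2.0   # M_a: nominal (0) or outlying (2)
    m_a = rng.gauss(big_m, math.sqrt(v2))         # agent's log10 median
    xs = sorted(10.0 ** rng.gauss(m_a, math.sqrt(v1)) for _ in range(t_a))
    levels = [0.0] + logit_grid(0.005, 0.995, 8) + [1.0]
    return big_m, [empirical_quantile(xs, p) for p in levels]


def central_ratio(v1=0.0924, v2=0.0924, p=0.99):
    """Q(p | M_a) / Q(1 - p | M_a) when log10 X | M_a ~ N(M_a, v1 + v2)."""
    z = NormalDist().inv_cdf(p)
    return 10.0 ** (2.0 * z * math.sqrt(v1 + v2))


rng = random.Random(2024)
records = [simulate_agent_record(rng) for _ in range(1000)]
print(sum(m == 2.0 for m, _ in records))  # number of outlying agents, about 10 on average
print(round(central_ratio(), 2))          # close to 100
```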

At the server, the D-buffer is sized to hold 100
length-10 records and the Q-buffer holds 1000 quantile
estimates with probabilities of 0, 1 and 998 values
equally spaced between $10^{-6}$ and $1 - 10^{-6}$ on
the logit scale. Interpolation uses $g(\cdot) = \mathrm{logit}(\cdot)$ as
described in Section 6.
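The server's probability grid is straightforward to construct. The sketch below builds 1000 levels, namely 0, 1 and 998 values equally spaced in $\mathrm{logit}(p)$ between $10^{-6}$ and $1 - 10^{-6}$, following the description above; the function name is hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(y):
    return 1.0 / (1.0 + math.exp(-y))


def q_buffer_levels(n_total=1000, eps=1e-6):
    """Levels 0 and 1 plus n_total - 2 values equally spaced in logit(p)
    between eps and 1 - eps."""
    n_inner = n_total - 2
    a, b = logit(eps), logit(1.0 - eps)
    step = (b - a) / (n_inner - 1)
    return [0.0] + [expit(a + i * step) for i in range(n_inner)] + [1.0]


levels = q_buffer_levels()
print(len(levels), levels[1], levels[-2])  # 1000, then values close to 1e-06 and 0.999999
```

Equal spacing on the logit scale packs levels densely near 0 and 1, which is what lets a 1000-entry buffer resolve tail quantiles of a sample of a million values.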

Figure 9 plots the ratio of IQ RMSE to EQ RMSE
after processing the records of all 1000 agents,
each summarizing 1000 data values, or one
million data values in all. The plot has two curves,
one for aggregation on the nominal data scale (solid 
line) and one for aggregation of logged agent records 
(dotted line). Logit interpolation is used in both 
cases. The most obvious feature is that transform- 
ing the data to the log scale improves performance, 
especially in the central part of the distribution. In 
fact, the worst relative performance occurs near the 
median when aggregating nominal data, but with 
logged data the IQ median estimate has the same 
RMSE as the empirical median. 

Both curves in Figure 9 show that the far up- 
per tail, corresponding to the 1% of outlying agents, 
is estimated with essentially the same accuracy as 
empirical quantiles. This is not a trivial result be- 
cause, even with logged data, each agent describes 
a different distribution and the complete mixture
is not Gaussian. Some additional experimentation
showed that nominal-scale performance in the central
portion of the distribution can be improved by
increasing the agent record length above 10. We
chose length-10 records, however, because this closely
matched the stringent requirements imposed for
monitoring networked applications.

Fig. 7. Hourly quantiles of email transaction times over a one-week period. Lines track empirical 0.5 and 0.9 quantiles while
vertical bars connect empirical quantiles to IQ estimates in order to highlight differences. Two rounds of IQ were performed:
first, agents prepared hourly records; then the server combined agent records to obtain the aggregate hourly results shown.

Fig. 8. Daily 0.5 and 0.9 quantiles of email transaction times over a one-month period. IQ results are obtained from
aggregating hourly records such as displayed in Figure 7, which corresponds to Week 3 in this figure.

Table 2
Fraction of cases in which IQ estimates are within 10% of the empirical
quantiles for email transaction times

Aggregation   Number     Fraction within 10%
level         of cases   0.5 quantile   0.9 quantile
Hourly        768        0.999          0.929
Daily         32         0.969          1.000

As a second experiment, we fed the agent records 
to the server algorithm sorted by increasing values of 
their log-medians $m_a$ rather than in random order.
In particular, most outlier records were processed af- 
ter nearly all nominal records had been processed. 
Remarkably, performance curves (not shown) for ag- 
gregating the ordered records are indistinguishable 
from the curves of Figure 9. In this experiment, at 
least, it made no difference whether inhomogeneous 
agent records were presented in random or sorted 
order. 

9. DISCUSSION 

Most corporate software is highly reliable, so it 
is only the tail behavior (and, hence, tail quantiles) 
of performance data that are of interest. Moreover, 
software performance and reliability are often
monitored for groups of users, not individual users,
partially because any one user may access the
software so infrequently that statistics based on
individual users are too unreliable to be interesting. Thus,
monitoring the reliability and performance of net- 
worked applications naturally leads to distributed 
monitoring and aggregating quantiles over groups 
of users and time. We have presented one approach 
to estimating aggregated quantiles from distributed 
monitoring data, and shown that it can give trust- 
worthy estimates using limited agent and network 
resources even if the agents are not homogeneous 
and their records arrive in what seems to be per- 
verse (smallest first) order. 

While this paper has focused on networked soft- 
ware, the need for estimating aggregated quantiles 
for highly reliable business systems arises in other 
contexts, too. Examples include communications soft- 
ware that routes calls to appropriate support staff in 
technical help centers and package tracking software 
used by delivery services to route shipments at way- 
points in a network of transit sites. Each of these ap- 
plications can generate huge amounts of data such 
as transaction time, size and completion status that 
can be used to monitor performance and reliability. 
For example, the call center for one computer man- 
ufacturer has on the order of 10,000 agents that to- 
gether handle millions of transactions per day, each 
of which can, in principle at least, be monitored for 
setup and response time. The transactions for an
agent can be measured, quantile records computed,
and then aggregate performance by work group or
location can be estimated.

Fig. 9. Server performance on inhomogeneous agents. The server processes 1000 length-10 agent records, each of which
summarizes 1000 data values. Marginally, a data value from the group of agents follows a mixture of log-normal distributions
that covers four orders of magnitude. Agent records are processed at the server in batches of 100 using a Q-buffer of length
1000. The resulting RMSEs are less than twice those of empirical quantiles for nominal-scale updating and less than 120% of
the EQ RMSEs with log-scale updating. Results are averaged across 500 simulation runs.

This paper has shown that IQ estimation pro- 
vides a way to track performance at several levels 
of aggregation over time, agents or space simultane- 
ously, where the set of agents, portion of the net- 
work, and time period of interest are not necessar- 
ily fixed in advance. Although IQ estimation can 
be applied whenever multiple quantiles are needed, 
it is probably most useful when interest focuses on 
tail quantiles or the data are not expected to fol- 
low a parametric distribution. This paper shows that 
IQ estimates provide useful information throughout 
the range of the data if logit probability values are 
combined with logit interpolation. This is especially 
important for evaluating the reliability and perfor- 
mance of networks and other systems that nearly al- 
ways perform well. For such systems, only tail quan- 
tiles are of interest. 

The IQ method can be characterized as "quick 
and dirty" in the sense that we work under the 
tight computational constraints imposed by the ap- 
plication, notably the fixed sizes of buffers and sum- 
mary records and the desire for simplicity. We are 
also willing to proceed with a method whose con- 
ventional statistical properties (e.g., bias and con- 
vergence) are not yet fully understood, partially be- 
cause standard sampling and distributional assump- 
tions seem unlikely to hold in the motivating ap- 
plications. As would be expected with a quick and 
dirty method, there are limitations to the result- 
ing estimates. For example, they assume that inter- 
est centers on aggregate performance over the entire 
workgroup or reporting interval rather than on the 
details of the performance experienced by individual 
users during the interval. Similarly, IQ estimates do 
not take account of trends over time or time-of-day 
patterns, such as the difference between peak and 
off-peak hours. It would be straightforward to allow 
trends by incorporating exponential weighting into 
the averaging steps for updating Q. Time-of-day or 
day-of-week patterns could be incorporated by start- 
ing each reporting period with a Q specific to the 
time period instead of an empty buffer or one that 
is continuously updated over all time periods. These 
can also be accommodated by defining the duration 
over which a Q-buffer is filled. Longer periods give 
more stable estimates, but may include data with 
dissimilar distributions. 
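One way to read the exponential-weighting suggestion is sketched below. It assumes that the update of Q is a count-weighted average of the current CDF and the CDF of the new batch, both tabulated on a common grid. Multiplying the accumulated count by a hypothetical `decay` factor before each update discounts older data geometrically, so the effective history settles near $n/(1 - \text{decay})$ for batches of size $n$. After blending, Q would still have to be re-read at its probability levels, which the sketch omits.

```python
def decayed_update(eff_count, batch_count, decay):
    """Weights for averaging the current Q CDF with a new batch CDF.

    decay = 1 gives plain count weighting (all data equally weighted);
    decay < 1 discounts older data geometrically. Returns
    (weight_on_Q, weight_on_batch, new_effective_count).
    """
    old = decay * eff_count
    total = old + batch_count
    return old / total, batch_count / total, total


def blend_cdfs(q_cdf, batch_cdf, w_q, w_batch):
    """Mix two CDFs tabulated on the same grid of x values."""
    return [w_q * f + w_batch * g for f, g in zip(q_cdf, batch_cdf)]


# Hypothetical run: batches of 100 values with decay 0.9.
eff = 0.0
for _ in range(200):
    w_q, w_b, eff = decayed_update(eff, 100, 0.9)
print(round(eff))  # effective history settles near 100 / (1 - 0.9) = 1000
```

With `decay = 1` the update reduces to ordinary pooling of all data, so the same code covers both the trend-following and the stationary case.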



On a mixed distribution with spikes, some empir- 
ical quantiles will be exactly correct with high prob- 
ability in large samples. IQ estimates do not behave 
as well, but if the spikes are known in advance, then 
the IQ algorithm could be easily modified to count 
hits at the spikes separately and process the remain- 
ing data through the IQ algorithm. There are, for 
example, preferred packet sizes in network data that 
cause spikes in the size distributions, but these are 
known in advance and so can be planned for. 
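Separating known spikes from the rest of the data can be sketched as follows. Hits at each spike value are simply counted, the remaining values would go through the IQ algorithm (here replaced by a plain empirical CDF as a stand-in), and the two parts are recombined as $F(x) = [n_{\text{rest}} F_{\text{rest}}(x) + \#\{\text{spike hits} \le x\}]/n$. The packet sizes and function names are hypothetical.

```python
from collections import Counter


def split_spikes(values, spikes):
    """Count hits at known spike values; return them with the remaining data."""
    spike_set = set(spikes)
    counts = Counter(v for v in values if v in spike_set)
    rest = [v for v in values if v not in spike_set]
    return counts, rest


def combined_cdf(x, spike_counts, rest_cdf, n_rest):
    """F(x) = (n_rest * F_rest(x) + number of spike hits at or below x) / n."""
    n_total = n_rest + sum(spike_counts.values())
    at_or_below = sum(c for s, c in spike_counts.items() if s <= x)
    return (n_rest * rest_cdf(x) + at_or_below) / n_total


# Hypothetical packet sizes (bytes) with spikes at 40, 576 and 1500.
sizes = [1500, 40, 1500, 576, 1500, 100, 200]
counts, rest = split_spikes(sizes, [40, 576, 1500])


def rest_cdf(x):
    # Stand-in for the IQ estimate of the non-spike data (here: the empirical CDF).
    return sum(v <= x for v in rest) / len(rest)


print(combined_cdf(576, counts, rest_cdf, len(rest)))  # 4/7
```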

Finally, the spacing in the probability values af- 
fects the performance of IQ estimates, but our al- 
gorithm makes no attempt to adjust the probability 
values over time. An algorithm that adjusted the 
probability values to minimize interpolation error 
associated with $F_Q$ would perform better, but would
probably not be as quick or straightforward. A simpler
approach would be to collect some training data to 
get a ballpark estimate of the shape of the distri- 
butions of interest and use that shape to inform 
the choice of probability values for Q. If extreme 
tails are of interest, it may help to gradually extend 
the most extreme probabilities into the tails as the 
total sample size builds. For example, the smallest 
nonzero probability could be maintained at approxi- 
mately 0.5/T, and nearby probabilities could be ad- 
justed correspondingly. 
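The suggestion of letting the smallest nonzero probability track about $0.5/T$ can be sketched as a grid that is regenerated as $T$ grows. The interior levels are re-spaced evenly in $\mathrm{logit}(p)$ between $0.5/T$ and $1 - 0.5/T$; the cap used while $T$ is small, the grid size and the function name are our assumptions.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(y):
    return 1.0 / (1.0 + math.exp(-y))


def extended_levels(total_count, n_total=1000, cap=0.01):
    """Q levels whose smallest nonzero value tracks about 0.5 / T.

    The interior levels are re-spaced evenly in logit(p) between
    lo = min(0.5 / T, cap) and 1 - lo; the cap (our assumption) keeps the
    grid sensible while T is still small.
    """
    lo = min(0.5 / total_count, cap)
    n_inner = n_total - 2
    a, b = logit(lo), logit(1.0 - lo)
    step = (b - a) / (n_inner - 1)
    return [0.0] + [expit(a + i * step) for i in range(n_inner)] + [1.0]


for t in (10, 10**4, 10**6):
    print(t, extended_levels(t)[1])  # smallest nonzero level: about 0.01, 5e-05, 5e-07
```

When the grid changes, the current Q would have to be re-read at the new levels, for example with the same interpolation used for $F_Q$.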

While there are many ways in which IQ estimates 
could be improved, the fact that they are easy to 
explain, easy to interpret, easy to implement, and 
provide useful information about tail behavior, even 
for aggregates over time, users and space, makes IQ 
estimates an attractive choice for monitoring perfor- 
mance and reliability. 